ABSTRACT

The number of comparisons $X_n$ used by Quicksort to sort an array of $n$ distinct numbers has mean $\mu_n$ of order $n \log n$ and standard deviation of order $n$. Using different methods, Régnier and Rösler each showed that the normalized variate $Y_n := (X_n - \mu_n)/n$ converges in distribution, say to $Y$; the distribution of $Y$ can be characterized as the unique fixed point with zero mean of a certain distributional transformation.

We provide the first rates of convergence for the distribution of $Y_n$ to that of $Y$, using various metrics. In particular, we establish the bound $2n^{-1/2}$ in the $d_2$-metric, and the rate $O(n^{\varepsilon-(1/2)})$ for Kolmogorov-Smirnov distance, for any positive $\varepsilon$.

1 Introduction and summary

This paper provides the first rates of convergence (as $n \to \infty$) for the distribution of the number of comparisons used by the sorting algorithm Quicksort to sort an array of $n$ distinct numbers. Quicksort is the standard sorting procedure in Unix systems, and has been cited [?] as one of the ten algorithms "with the greatest influence on the development and practice of science and engineering in the 20th century." We begin with a brief review of what is known about the analysis of Quicksort and a summary of our new results.

The Quicksort algorithm for sorting an array of $n$ distinct numbers is extremely simple to describe. If $n = 0$ or $n = 1$, there is nothing to do. If $n \ge 2$, pick a number uniformly at random from the given array. Compare the other numbers to it to partition the remaining numbers into two subarrays. Then recursively invoke Quicksort on each of the two subarrays.
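
The description above can be sketched in a few lines. This is a minimal illustration, not an efficient in-place implementation; the function name `quicksort_comparisons` and the list-based partition are assumptions made for the example. It counts $n - 1$ comparisons per partition step, which is the quantity analyzed in this paper.

```python
import random

def quicksort_comparisons(values, rng):
    """Sort distinct numbers; return (sorted_list, number_of_comparisons)."""
    if len(values) <= 1:
        return list(values), 0          # n = 0 or n = 1: nothing to do
    pivot = rng.choice(values)          # pivot chosen uniformly at random
    smaller, larger = [], []
    for v in values:
        if v == pivot:
            continue                    # the pivot is not compared with itself
        (smaller if v < pivot else larger).append(v)
    left, c_left = quicksort_comparisons(smaller, rng)
    right, c_right = quicksort_comparisons(larger, rng)
    # n - 1 comparisons for the partition step, plus the recursive costs
    return left + [pivot] + right, (len(values) - 1) + c_left + c_right

rng = random.Random(0)                  # fixed seed, only for reproducibility
data = rng.sample(range(1000), 50)
sorted_data, comparisons = quicksort_comparisons(data, rng)
```

Here `comparisons` is one realization of the random variable whose distribution is studied below.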

Let $X_n$ denote the (random) number of comparisons required (so that $X_0 = 0$). Then $X_n$ satisfies the distributional recurrence relation

$$X_n \stackrel{\mathcal L}{=} X_{U_n-1} + X^*_{n-U_n} + n - 1, \qquad n \ge 1, \tag{1.1}$$

where $\stackrel{\mathcal L}{=}$ denotes equality in law (i.e., in distribution), and where, on the right, $U_n$ is distributed uniformly on the set $\{1, \dots, n\}$, $X^*_j \stackrel{\mathcal L}{=} X_j$, and

$$U_n;\ X_0, \dots, X_{n-1};\ X^*_0, \dots, X^*_{n-1}$$

are all independent.

As is well known and quite easily established, for $n \ge 0$ we have

$$\mu_n := E X_n = 2(n+1)H_n - 4n \sim 2n\ln n,$$

where $H_n := \sum_{k=1}^n k^{-1}$ is the $n$th harmonic number and $\sim$ denotes asymptotic equivalence. It is also routine to compute explicitly the variance of $X_n$ (see Exercise 6.2.2-8 in [14]):

$$\operatorname{Var} X_n = 7n^2 - 4(n+1)^2H_n^{(2)} - 2(n+1)H_n + 13n = \sigma^2n^2 - 2n\ln n + O(n), \tag{1.2}$$

where $H_n^{(2)} := \sum_{k=1}^n k^{-2}$ are the second-order harmonic numbers and

$$\sigma^2 := 7 - \tfrac23\pi^2 \approx 0.42. \tag{1.3}$$
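
The closed forms for the mean and the variance can be checked against the recurrence (1.1) in exact rational arithmetic. The sketch below is only such a check; the helper names are assumptions made for the example. It conditions on $U_n = i$ and uses the independence of the two subproblem costs.

```python
from fractions import Fraction

def exact_moments(n_max):
    """First and second moments of X_0..X_{n_max} from recurrence (1.1)."""
    m1 = [Fraction(0)]   # E X_n
    m2 = [Fraction(0)]   # E X_n^2
    for n in range(1, n_max + 1):
        s1 = Fraction(0)
        s2 = Fraction(0)
        for i in range(1, n + 1):          # i is the value of U_n
            a, b = m1[i - 1], m1[n - i]
            s1 += a + b + (n - 1)
            # E (A + B + c)^2 for independent A, B and constant c = n - 1
            s2 += (m2[i - 1] + m2[n - i] + (n - 1) ** 2
                   + 2 * a * b + 2 * (n - 1) * (a + b))
        m1.append(s1 / n)
        m2.append(s2 / n)
    return m1, m2

def harmonic(n, power=1):
    """Harmonic number H_n (power=1) or H_n^(2) (power=2), exactly."""
    return sum(Fraction(1, k ** power) for k in range(1, n + 1))

def mean_formula(n):
    return 2 * (n + 1) * harmonic(n) - 4 * n

def variance_formula(n):
    return (7 * n ** 2 - 4 * (n + 1) ** 2 * harmonic(n, 2)
            - 2 * (n + 1) * harmonic(n) + 13 * n)

m1, m2 = exact_moments(30)
```
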
Consider the normalized variate

$$Y_n := (X_n - \mu_n)/n, \qquad n \ge 1.$$

Then (1.1) implies the recursion

$$Y_n \stackrel{\mathcal L}{=} \frac{U_n-1}{n}Y_{U_n-1} + \frac{n-U_n}{n}Y^*_{n-U_n} + C_n(U_n), \qquad n \ge 1, \tag{1.4}$$

with $Y_0$ arbitrarily defined (since its coefficient is 0), where on the right, as for $X_n$, we have $U_n \sim \mathrm{unif}\{1, \dots, n\}$ and $Y^*_j \stackrel{\mathcal L}{=} Y_j$, and $U_n$; $Y_0, \dots, Y_{n-1}$; $Y^*_0, \dots, Y^*_{n-1}$ are all independent; further,

$$C_n(i) := \frac{n-1}{n} + \frac1n(\mu_{i-1} + \mu_{n-i} - \mu_n), \qquad 1 \le i \le n. \tag{1.5}$$

Note that $E Y_n = 0 = E C_n(U_n)$. We will see below that if $n \to \infty$ and $i/n \to u \in [0,1]$, then $C_n(i) \to C(u)$, where

$$C(u) := 2u\ln u + 2(1-u)\ln(1-u) + 1, \qquad u \in [0,1],$$

with the natural (continuous) interpretation $C(u) := 1$ for $u = 0, 1$.
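
The convergence $C_n(i) \to C(u)$ can be observed numerically. The following sketch (function names are assumptions made for the example) evaluates (1.5) in floating point, checks that $C_n(U_n)$ has mean zero, and measures the gap between $C_n(i)$ and $C(i/n)$ over the lower half of the index range.

```python
import math

def mu(n):
    """Mean number of comparisons, mu_n = 2(n+1)H_n - 4n."""
    h = sum(1.0 / k for k in range(1, n + 1))
    return 2 * (n + 1) * h - 4 * n

def C_n(n, i):
    """Toll term of the normalized recursion (1.4), see (1.5)."""
    return (n - 1) / n + (mu(i - 1) + mu(n - i) - mu(n)) / n

def C(u):
    """Limiting toll function, with C(0) = C(1) = 1 by continuity."""
    if u in (0.0, 1.0):
        return 1.0
    return 2 * u * math.log(u) + 2 * (1 - u) * math.log(1 - u) + 1

n = 400
mean_toll = sum(C_n(n, i) for i in range(1, n + 1)) / n          # E C_n(U_n)
max_gap = max(abs(C_n(n, i) - C(i / n)) for i in range(1, n // 2 + 1))
```

Increasing `n` shrinks `max_gap`, in line with the stated limit.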

Moreover, Régnier [15] and Rösler [16] showed, using different methods, that $Y_n \to Y$ in distribution, with $Y$ satisfying the distributional identity

$$Y \stackrel{\mathcal L}{=} UY + (1-U)Y^* + C(U) \tag{1.6}$$

obtained by formally taking limits in (1.4), where, on the right, $U$, $Y$, and $Y^*$ are independent, with $Y^* \stackrel{\mathcal L}{=} Y$ and $U \sim \mathrm{unif}(0,1)$. [Rösler [16] showed further that (1.6) characterizes the limiting law $\mathcal L(Y)$, subject to $E Y = 0$ and $\operatorname{Var} Y < \infty$. For a complete characterization of the distributions satisfying (1.6), see [?].]

The purpose of the present paper is to study the rate of convergence of $\mathcal L(Y_n)$ to $\mathcal L(Y)$, using several different measures of the distance between $\mathcal L(Y_n)$ and $\mathcal L(Y)$.
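
As a consistency check between (1.3) and (1.6): taking second moments in (1.6), with $E Y = 0$ and the stated independence, gives $\sigma^2 = \tfrac23\sigma^2 + E\,C(U)^2$, that is, $\sigma^2 = 3\,E\,C(U)^2$. The sketch below evaluates $E\,C(U)^2$ with a simple midpoint rule; the quadrature and the number of points are assumptions made for the example.

```python
import math

def C(u):
    """Limiting toll function on the open interval (0, 1)."""
    return 2 * u * math.log(u) + 2 * (1 - u) * math.log(1 - u) + 1

def second_moment_of_C(num_points=200_000):
    """Midpoint-rule approximation of E C(U)^2 for U ~ unif(0, 1)."""
    h = 1.0 / num_points
    return sum(C((j + 0.5) * h) ** 2 for j in range(num_points)) * h

sigma_sq = 7 - 2 * math.pi ** 2 / 3            # value from (1.3)
sigma_sq_from_identity = 3 * second_moment_of_C()
```
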

First, for real $1 \le p < \infty$, let $\|X\|_p := (E|X|^p)^{1/p}$ denote the $L^p$-norm, and let $d_p$ denote the metric on the space of all probability distributions with finite $p$th absolute moment defined by

$$d_p(F,G) := \min\|X - Y\|_p,$$

taking the minimum over all pairs of random variables $X$ and $Y$ (defined on the same probability space) with $\mathcal L(X) = F$ and $\mathcal L(Y) = G$. We will use the fact [?] that the minimum is attained for each $1 \le p < \infty$ by the same $X$ and $Y$, viz., $X := F^{-1}(u)$ and $Y := G^{-1}(u)$ defined for $u$ in the probability space $(0,1)$ (with Lebesgue measure).

We will for simplicity write $d_p(X,Y) := d_p(\mathcal L(X), \mathcal L(Y))$ for random variables $X$ and $Y$, but note that this distance depends only on the marginal distributions of $X$ and $Y$.
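
For two laws that each put mass $1/n$ on $n$ points, the quantile coupling above amounts to pairing the sorted atoms. The sketch below illustrates this in that special case only (the function names and sample values are assumptions made for the example), comparing the sorted pairing with a brute-force minimum over all one-to-one pairings.

```python
from itertools import permutations

def lp_cost(xs, ys, p):
    """||X - Y||_p when (X, Y) is uniform on the pairs (xs[k], ys[k])."""
    n = len(xs)
    return (sum(abs(x - y) ** p for x, y in zip(xs, ys)) / n) ** (1.0 / p)

def d_p_quantile(xs, ys, p):
    """d_p between two n-point uniform laws via the quantile coupling."""
    return lp_cost(sorted(xs), sorted(ys), p)

def d_p_brute_force(xs, ys, p):
    """Minimum over all one-to-one pairings (only feasible for tiny n)."""
    return min(lp_cost(xs, perm, p) for perm in permutations(ys))

xs = [0.3, -1.2, 2.5, 0.9, -0.4]
ys = [1.1, 0.0, -2.0, 3.4, 0.6]
```

The same sorted pairing is used for every `p`, which mirrors the statement that one coupling is optimal for all $d_p$ simultaneously.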

Rösler [16] showed that $d_p(Y_n, Y) \to 0$ as $n \to \infty$ for every $1 \le p < \infty$. In Sections 2 and 3 we will quantify this and show that

$$d_p(Y_n, Y) = O(n^{-1/2})$$

for every fixed $p$. In the case $p = 2$ we will further show the explicit bound

$$d_2(Y_n, Y) < 2n^{-1/2}.$$

We do not know whether the $n^{-1/2}$ rate is sharp, although that is widely believed. The best lower bound we can show (Section 4) is

$$d_p(Y_n, Y) \ge c\,\frac{\ln n}{n}, \qquad p \ge 2,$$

with $c > 0$ independent of $p$.

In Section 5 we use these results to bound the Kolmogorov-Smirnov distance $d_{KS}(Y_n, Y)$ between $\mathcal L(Y_n)$ and $\mathcal L(Y)$. We show that

$$d_{KS}(Y_n, Y) = O(n^{\varepsilon-(1/2)})$$

for every $\varepsilon > 0$. We believe that the rate is in fact $O(n^{-1/2})$, but again do not know the exact rate. The best lower bound we can prove is $c/n$ with $c > 0$.

In Section 6 we prove a kind of local limit theorem which enables us to approximate the density function $f$ of $Y$. (It was proved by Tan and Hadjicostas [17] that $Y$ has a density function; $f$ is bounded and infinitely differentiable by [?].)

Rösler [16] showed that (for fixed $\lambda \in R$) the moment generating function values $E e^{\lambda Y_n}$ are bounded and thus converge to $E e^{\lambda Y}$. Again we quantify his bounds and give in Section 7 explicit bounds, based on Rösler's method.

In several (but not all) bounds we give explicit numerical values to constants. These values are hardly the best possible, but we make some effort to get fairly small values. This sometimes includes the use of extensive numerical verification by computer for small $n$. [All numerical calculations have been verified independently by the two authors, the (alphabetically) first using Mathematica and the second using Maple.] Such arguments could be simplified or omitted at the cost of increasing the constants.

1.1 Preliminaries

In order to later estimate $C_n(i)$ defined by (1.5) we need some explicit bounds on $\mu_n$. First, as mentioned above,

$$\mu_n = 2(n+1)H_n - 4n, \tag{1.7}$$

which can be rewritten

$$\mu_n = 2(n+1)H_{n+1} - 4n - 2. \tag{1.8}$$

Next we use the bounds on the harmonic numbers (see, e.g., Section 1.2.11.2 in [13])

$$\ln n + \gamma \le H_n \le \ln n + \gamma + \frac{1}{2n}, \qquad n \ge 1. \tag{1.9}$$

Hence, for $n \ge 1$, from (1.7)

$$2(n+1)\ln n + (2\gamma-4)n + 2\gamma \le \mu_n \le 2(n+1)\ln n + (2\gamma-4)n + 2\gamma + \frac{n+1}{n} \tag{1.10}$$

and from (1.8)

$$2n\ln n + (2\gamma-4)n + 2 \le \mu_{n-1} \le 2n\ln n + (2\gamma-4)n + 3. \tag{1.11}$$

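
The bounds (1.9)-(1.11) are easy to spot-check in floating point. The sketch below does so for a range of $n$; the function name, the range, and the hard-coded value of Euler's constant $\gamma$ are assumptions made for the example.

```python
import math

EULER_GAMMA = 0.5772156649015329   # Euler's constant, written gamma in (1.9)

def check_bounds(n_max):
    """Return True if (1.9)-(1.11) hold numerically for 1 <= n <= n_max."""
    g = EULER_GAMMA
    h = 0.0           # running harmonic number H_n
    mu_prev = 0.0     # mu_{n-1}, starting from mu_0 = 0
    for n in range(1, n_max + 1):
        h += 1.0 / n
        mu_n = 2 * (n + 1) * h - 4 * n                      # (1.7)
        ln = math.log(n)
        ok_19 = ln + g <= h <= ln + g + 1 / (2 * n)
        base_110 = 2 * (n + 1) * ln + (2 * g - 4) * n + 2 * g
        ok_110 = base_110 <= mu_n <= base_110 + (n + 1) / n
        base_111 = 2 * n * ln + (2 * g - 4) * n
        ok_111 = base_111 + 2 <= mu_prev <= base_111 + 3
        if not (ok_19 and ok_110 and ok_111):
            return False
        mu_prev = mu_n
    return True

all_hold = check_bounds(2000)
```

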
2 Bounding $d_2(Y_n, Y)$

In this section we prove the following explicit estimate for $d_2(Y_n, Y)$.

Theorem 2.1. For all $n \ge 1$, $d_2(Y_n, Y) < 2/\sqrt n$.

Proof. We basically follow the method of Rösler [16], making all estimates explicit. We study in this paper only properties of the univariate distributions of $Y_n$. We thus take the liberty of letting $Y_n$ denote any random variable with the appropriate distribution [$Y_n \stackrel{\mathcal L}{=} (X_n - \mu_n)/n$]. We then may choose $Y_0, Y_1, \dots$ defined on the same probability space as $Y$ and such that

$$\|Y_n - Y\|_2 = d_2(Y_n, Y), \qquad n \ge 0.$$

Further, let $(Y^*, Y^*_0, Y^*_1, \dots)$ be an independent copy of $(Y, Y_0, Y_1, \dots)$ and let $U \sim \mathrm{unif}(0,1)$ be independent of everything else. For convenience we write $a_n := d_2(Y_n, Y)$.
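Observe, by (1.4), that

$$Y_n \stackrel{\mathcal L}{=} \tilde Y_n := \frac{\lceil nU\rceil - 1}{n}Y_{\lceil nU\rceil-1} + \frac{n - \lceil nU\rceil}{n}Y^*_{n-\lceil nU\rceil} + C_n(\lceil nU\rceil) \tag{2.1}$$

and recall from (1.6) that

$$Y \stackrel{\mathcal L}{=} \tilde Y := UY + (1-U)Y^* + C(U). \tag{2.2}$$

Therefore,

$$a_n^2 = d_2^2(Y_n, Y) \le E|\tilde Y_n - \tilde Y|^2. \tag{2.3}$$

Now

$$\tilde Y_n - \tilde Y = \left(\frac{\lceil nU\rceil-1}{n}Y_{\lceil nU\rceil-1} - UY\right) + \left(\frac{n-\lceil nU\rceil}{n}Y^*_{n-\lceil nU\rceil} - (1-U)Y^*\right) + \bigl(C_n(\lceil nU\rceil) - C(U)\bigr) =: W_1 + W_2 + W_3,$$

say. Given $U$, the random variables $W_1$ and $W_2$ are independent with zero mean, while $W_3$ is a constant. Hence

$$E\bigl(|\tilde Y_n - \tilde Y|^2 \mid U\bigr) = E\bigl((W_1+W_2+W_3)^2 \mid U\bigr) = E(W_1^2 \mid U) + E(W_2^2 \mid U) + W_3^2,$$

and thus, taking expectations,

$$E|\tilde Y_n - \tilde Y|^2 = E W_1^2 + E W_2^2 + E W_3^2. \tag{2.4}$$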



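By symmetry (replacing $U$ by $1-U$), $E W_2^2 = E W_1^2$. We estimate this term by conditioning on $U$, using the independence of $U$ and $Y, Y_0, \dots$. If $U = (k+v)/n$, with $k \in \{0, 1, \dots, n-1\}$ and $0 < v \le 1$, then $\lceil nU\rceil = k+1$ and $W_1 = \frac kn(Y_k - Y) - \frac vnY$; hence Minkowski's inequality yields

$$E\bigl(W_1^2 \mid U = (k+v)/n\bigr)^{1/2} \le \frac kn\|Y_k - Y\|_2 + \frac vn\|Y\|_2 = \frac kna_k + \frac vn\sigma.$$

Consequently,

$$\begin{aligned} E W_1^2 &= \frac1n\sum_{k=0}^{n-1}\int_0^1E\bigl(W_1^2 \mid U = (k+v)/n\bigr)\,dv \le \frac1n\sum_{k=0}^{n-1}\int_0^1\left(\frac kna_k + \frac vn\sigma\right)^2dv \\ &= \frac1n\sum_{k=0}^{n-1}\int_0^1\left(\frac{k^2}{n^2}a_k^2 + \frac{2kv}{n^2}a_k\sigma + \frac{v^2}{n^2}\sigma^2\right)dv \\ &= \frac1n\sum_{k=0}^{n-1}\left(\frac{k^2}{n^2}a_k^2 + \frac k{n^2}a_k\sigma + \frac{\sigma^2}{3n^2}\right). \end{aligned} \tag{2.5}$$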

We postpone the estimation of $E W_3^2$, and introduce the notation

$$b_n := \|W_3\|_2 = \|C_n(\lceil nU\rceil) - C(U)\|_2. \tag{2.6}$$

Combining (2.3)-(2.6), we obtain our fundamental recursive estimate

$$a_n^2 \le 2E W_1^2 + E W_3^2 \le \frac2{n^3}\sum_{k=1}^{n-1}k^2a_k^2 + \frac{2\sigma}{n^3}\sum_{k=1}^{n-1}ka_k + \frac{2\sigma^2}{3n^2} + b_n^2, \qquad n \ge 1. \tag{2.7}$$

We unwrap this recursion partly, by concentrating on the first sum on the right-hand side and regarding the second as known. Thus, writing

$$y_n := \frac{2\sigma}{n^3}\sum_{k=1}^{n-1}ka_k + \frac{2\sigma^2}{3n^2} + b_n^2, \tag{2.8}$$

we define recursively

$$x_n := \frac2n\sum_{k=1}^{n-1}x_k + n^2y_n, \qquad n \ge 1, \tag{2.9}$$

and find by (2.7) and induction

$$n^2a_n^2 \le x_n, \qquad n \ge 1.$$

Now, the recursion (2.9) is easily solved (see, e.g., [?]), giving

$$a_n^2 \le n^{-2}x_n = y_n + 2\,\frac{n+1}{n^2}\sum_{j=1}^{n-1}\frac{j^2}{(j+1)(j+2)}\,y_j, \qquad n \ge 1. \tag{2.10}$$

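
The closed form in (2.10) can be checked against the recursion (2.9) in exact arithmetic. In the sketch below the input sequence $y_n$ is arbitrary (random rationals), since the solution formula does not depend on where $y_n$ comes from; the function names are assumptions made for the example.

```python
from fractions import Fraction
import random

def solve_recursively(y):
    """x_n = (2/n) * sum_{k<n} x_k + n^2 * y_n, for n = 1..len(y)."""
    x = []
    for n in range(1, len(y) + 1):
        x.append(Fraction(2, n) * sum(x) + n * n * y[n - 1])
    return x

def closed_form(y, n):
    """n^{-2} x_n as given by the right-hand side of (2.10)."""
    tail = sum(Fraction(j * j, (j + 1) * (j + 2)) * y[j - 1]
               for j in range(1, n))
    return y[n - 1] + Fraction(2 * (n + 1), n * n) * tail

rng = random.Random(1)
y = [Fraction(rng.randint(0, 50), rng.randint(1, 50)) for _ in range(40)]
x = solve_recursively(y)
```

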
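We substitute (2.8), treating the three terms separately, into (2.10). The first term in (2.8) yields the sum

$$\sum_{j=1}^{n-1}\frac{j^2}{(j+1)(j+2)}\,\frac{2\sigma}{j^3}\sum_{k=1}^{j-1}ka_k = 2\sigma\sum_{k=1}^{n-1}\sum_{k<j<n}\frac{ka_k}{j(j+1)(j+2)} = \sigma\sum_{k=1}^{n-1}ka_k\sum_{j=k+1}^{n-1}\left(\frac1{j(j+1)} - \frac1{(j+1)(j+2)}\right) = \sigma\sum_{k=1}^{n-1}ka_k\left(\frac1{(k+1)(k+2)} - \frac1{n(n+1)}\right)$$

and the total contribution

$$\frac{2\sigma}{n^3}\sum_{k=1}^{n-1}ka_k + 2\sigma\frac{n+1}{n^2}\sum_{k=1}^{n-1}ka_k\left(\frac1{(k+1)(k+2)} - \frac1{n(n+1)}\right) = 2\sigma\frac{n+1}{n^2}\sum_{k=1}^{n-1}\frac{ka_k}{(k+1)(k+2)}. \tag{2.11}$$

The second term in (2.8) yields the sum

$$\sum_{j=1}^{n-1}\frac{j^2}{(j+1)(j+2)}\,\frac{2\sigma^2}{3j^2} = \frac{2\sigma^2}3\sum_{j=1}^{n-1}\left(\frac1{j+1} - \frac1{j+2}\right) = \frac{2\sigma^2}3\left(\frac12 - \frac1{n+1}\right)$$

and the total contribution

$$\frac{2\sigma^2}{3n^2} + 2\,\frac{n+1}{n^2}\,\frac{2\sigma^2}3\left(\frac12 - \frac1{n+1}\right) = \frac{2\sigma^2}{3n^2}(1 + n + 1 - 2) = \frac{2\sigma^2}{3n}. \tag{2.12}$$

Hence we find from (2.10)

$$a_n^2 \le 2\sigma\frac{n+1}{n^2}\sum_{k=1}^{n-1}\frac{ka_k}{(k+1)(k+2)} + \frac{2\sigma^2}{3n} + b_n^2 + 2\,\frac{n+1}{n^2}\sum_{k=1}^{n-1}\frac{k^2b_k^2}{(k+1)(k+2)}. \tag{2.13}$$

We next use the following estimate of $b_n$, whose proof we postpone.

Lemma 2.2. For $n \ge 1$,

$$b_n := \|C_n(\lceil nU\rceil) - C(U)\|_2 \le \left(3 + \frac{2\pi}{\sqrt3}\right)\frac1n < \frac{6.63}n.$$

Using this lemma in (2.13), we find in analogy with (2.12)

$$b_n^2 + 2\,\frac{n+1}{n^2}\sum_{k=1}^{n-1}\frac{k^2b_k^2}{(k+1)(k+2)} < \frac{(6.63)^2}n < \frac{44}n, \tag{2.14}$$
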
and thus

$$a_n^2 \le 2\sigma\frac{n+1}{n^2}\sum_{k=1}^{n-1}\frac{ka_k}{(k+1)(k+2)} + \left(\frac{2\sigma^2}3 + 44\right)\frac1n. \tag{2.15}$$

We claim that (2.15) implies the sought estimate $a_n = O(n^{-1/2})$. Indeed, assume that $n \ge 1$ and that $A > 0$ is a number such that

$$a_k \le A/\sqrt k \tag{2.16}$$

for $1 \le k \le n-1$. Then, using $k+1 \ge [k(k+2)]^{1/2}$,

$$\sum_{k=1}^{n-1}\frac{ka_k}{(k+1)(k+2)} \le A\sum_{k=1}^{n-1}\frac{k^{1/2}}{(k+1)(k+2)} \le A\sum_{k=1}^{n-1}\frac1{(k+2)^{3/2}} \le A\int_1^n\frac{dx}{(x+1)^{3/2}} = 2A\bigl(2^{-1/2} - (n+1)^{-1/2}\bigr). \tag{2.17}$$

In particular, for $n \ge 2$,

$$\frac1n\sum_{k=1}^{n-1}\frac{ka_k}{(k+1)(k+2)} \le \frac1n\,2^{1/2}A \le 2A(n+1)^{-1/2} \tag{2.18}$$

and thus (2.17) yields (trivially for $n = 1$, too)

$$\frac{n+1}n\sum_{k=1}^{n-1}\frac{ka_k}{(k+1)(k+2)} \le 2^{1/2}A.$$

Consequently, by (2.15),

$$na_n^2 \le 2^{3/2}\sigma A + 44 + \frac{2\sigma^2}3 \le 2^{3/2}\sigma A + 45.$$

If $2^{3/2}\sigma A + 45 \le A^2$, which holds for example for $A = 8$, then this yields $na_n^2 \le A^2$, and thus (2.16) holds for $k = n$, too. By induction, (2.16) holds for all $k \ge 1$, and we have proved the explicit estimate

$$a_n \le \frac8{\sqrt n}, \qquad n \ge 1. \tag{2.19}$$



This is the desired estimate, apart from the value of the constant. To improve the constant, we use numerical calculations by computer. Indeed, for (2.6),

$$\begin{aligned} b_n^2 &= \sum_{i=1}^n\int_{(i-1)/n}^{i/n}\bigl(C(u) - C_n(i)\bigr)^2du \\ &= \int_0^1C(u)^2du - 2\sum_{i=1}^nC_n(i)\int_{(i-1)/n}^{i/n}C(u)\,du + \sum_{i=1}^n\frac1nC_n(i)^2 \\ &= \frac{\sigma^2}3 - 2\sum_{i=1}^nC_n(i)\left[F\!\left(\frac in\right) - F\!\left(\frac{i-1}n\right)\right] + \frac1n\sum_{i=1}^nC_n(i)^2, \end{aligned}$$

where $F(u) := u^2\ln u - (1-u)^2\ln(1-u)$ and $C_n(i)$ is given by (1.5); so, given any integer $N$, $b_n$ can be computed exactly for $n \le N$. Next, for $n = 1, \dots, N$, an upper bound $\bar a_n$ to $a_n$ can be computed recursively from (2.7) or, equivalently, (2.13), using the already computed $\bar a_k$, $k < n$, to bound $a_k$ in the right-hand side. (We do not know how to compute $a_n$ exactly even for $n = 3$.) For larger $n$, we use the estimates (2.16) and Lemma 2.2.
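
A floating-point sketch of this computation of $b_n$ follows. It assumes that $F$ is an antiderivative of $C$ (one checks $F' = C$ directly) and that $\int_0^1 C(u)^2\,du = \sigma^2/3$, which follows from taking variances in (1.6); the function names are assumptions made for the example, and no attempt is made to reproduce the exact arithmetic of a computer algebra system.

```python
import math

SIGMA_SQ = 7 - 2 * math.pi ** 2 / 3        # sigma^2 from (1.3)

def toll_values(n):
    """List [C_n(1), ..., C_n(n)] computed from (1.5) and (1.7)."""
    mu, h = [0.0], 0.0
    for m in range(1, n + 1):
        h += 1.0 / m
        mu.append(2 * (m + 1) * h - 4 * m)
    return [(n - 1) / n + (mu[i - 1] + mu[n - i] - mu[n]) / n
            for i in range(1, n + 1)]

def F(u):
    """F(u) = u^2 ln u - (1-u)^2 ln(1-u), with F(0) = F(1) = 0."""
    left = u * u * math.log(u) if u > 0 else 0.0
    right = (1 - u) ** 2 * math.log(1 - u) if u < 1 else 0.0
    return left - right

def b_squared(n):
    """b_n^2 via the expansion in terms of F and the values C_n(i)."""
    c = toll_values(n)
    cross = sum(c[i - 1] * (F(i / n) - F((i - 1) / n)) for i in range(1, n + 1))
    return SIGMA_SQ / 3 - 2 * cross + sum(v * v for v in c) / n

# n * b_n for n = 1..100, to be compared with the constant 6.63 of Lemma 2.2
scaled = [n * math.sqrt(max(b_squared(n), 0.0)) for n in range(1, 101)]
```

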
Let

$$V_n := \sum_{k=1}^n\frac{ka_k}{(k+1)(k+2)}, \qquad \bar V_n := \sum_{k=1}^n\frac{k\bar a_k}{(k+1)(k+2)}, \qquad W_n := \sum_{k=1}^n\frac{k^2b_k^2}{(k+1)(k+2)}.$$

Then for $n > N$, arguing as in (2.17), for any $A$ such that (2.16) holds for all $k$,

$$V_{n-1} \le \bar V_N + \sum_{k=N+1}^{n-1}\frac A{(k+2)^{3/2}} \le \bar V_N + 2A\bigl((N+2)^{-1/2} - (n+1)^{-1/2}\bigr)$$

and thus by (2.18)

$$\frac{n+1}nV_{n-1} = V_{n-1} + \frac1nV_{n-1} \le \bar V_N + 2A(N+2)^{-1/2}. \tag{2.20}$$

Similarly, with $B := \left(3 + \frac{2\pi}{\sqrt3}\right)^2 < 44$, for $n > N$, by Lemma 2.2 we have

$$\begin{aligned} \frac{n+1}nW_{n-1} &\le W_N + \sum_{k=N+1}^{n-1}\frac B{(k+1)(k+2)} + \frac1n\sum_{k=1}^{n-1}\frac B{(k+1)(k+2)} \\ &= W_N + B\left(\frac1{N+2} - \frac1{n+1} + \frac1{2n} - \frac1{n(n+1)}\right) = W_N + \frac B{N+2} - \frac B{2n}. \end{aligned}$$

Consequently, (2.13) yields, using Lemma 2.2 again and (2.20),

$$\begin{aligned} a_n^2 &\le 2\sigma\frac{n+1}{n^2}V_{n-1} + \frac{2\sigma^2}{3n} + \frac B{n^2} + 2\,\frac{n+1}{n^2}W_{n-1} \\ &\le \frac1n\left(2\sigma\bar V_N + 4\sigma A(N+2)^{-1/2} + \frac{2\sigma^2}3 + 2W_N + 2B(N+2)^{-1}\right), \qquad n > N. \end{aligned}$$

In other words, (2.16) holds for $k > N$, with $A$ replaced by

$$A_N := \left(2\sigma\bar V_N + 4\sigma A(N+2)^{-1/2} + \frac{2\sigma^2}3 + 2W_N + 2B(N+2)^{-1}\right)^{1/2}. \tag{2.21}$$

For $N = 100$ we find (using Mathematica or Maple), rounded to four decimal places, $\bar V_{100} = 1.1995$ and $W_{100} = 0.3466$, and thus, taking $A = 8$ as in (2.19), $A_{100} = 2.3332$. Moreover, the computer verifies that $n^{1/2}\bar a_n < 1.7$ for $n \le 100$; thus (2.16) holds for all $k \ge 1$ with $A = 2.34$. Using this value in (2.21) we find $A_{100} = 1.9976$, and the theorem is proved. □
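
The two-pass bootstrap at the end of the proof can be replayed from the reported values. The sketch below only evaluates the right-hand side of (2.21); it takes $\bar V_{100}$ and $W_{100}$ as the rounded values quoted in the text rather than recomputing them, so its results agree with the quoted $A_{100}$ only up to that rounding. The function name is an assumption made for the example.

```python
import math

SIGMA = math.sqrt(7 - 2 * math.pi ** 2 / 3)       # sigma from (1.3)
B = (3 + 2 * math.pi / math.sqrt(3)) ** 2         # constant B of the proof

def improved_constant(A, N, V_bar_N, W_N):
    """Right-hand side of (2.21): the constant A_N obtained from A."""
    inner = (2 * SIGMA * V_bar_N
             + 4 * SIGMA * A / math.sqrt(N + 2)
             + 2 * SIGMA ** 2 / 3
             + 2 * W_N
             + 2 * B / (N + 2))
    return math.sqrt(inner)

# Values reported in the text for N = 100 (rounded to four decimals).
V_BAR_100, W_100 = 1.1995, 0.3466
first_pass = improved_constant(8.0, 100, V_BAR_100, W_100)    # start from A = 8
second_pass = improved_constant(2.34, 100, V_BAR_100, W_100)  # then A = 2.34
```

The second pass falls just below 2, which is what the theorem needs for $n > 100$.
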
Remark 2.3. The sequence $n^{1/2}\bar a_n$ seems to increase slowly. For $n = 100$ the value is (rounded to four decimal places) 1.6018, and hence the bound in Theorem 2.1 cannot be much improved using the present method based on (2.7).

It remains to prove Lemma 2.2 above.

Proof of Lemma 2.2. Let $I_i := \{u : \lceil nu\rceil = i\} = ((i-1)/n, i/n]$. Thus $I_1, \dots, I_n$ form a partition of $(0,1]$. We choose a point $t_i \in \bar I_i$ for each $i$ (where the bar here indicates closure) and define

$$\tilde C_n(u) := C(t_{\lceil nu\rceil}),$$

i.e., $\tilde C_n(u) = C(t_i)$ when $u \in I_i$. By Minkowski's inequality,

$$b_n \le \|C_n(\lceil nU\rceil) - \tilde C_n(U)\|_2 + \|\tilde C_n(U) - C(U)\|_2. \tag{2.22}$$

To estimate the second term in (2.22), note that for $u \in I_i$,

$$|\tilde C_n(u) - C(u)| = |C(t_i) - C(u)| \le \int_{I_i}|C'(x)|\,dx.$$

The Cauchy-Schwarz inequality yields

$$|\tilde C_n(u) - C(u)|^2 \le \frac1n\int_{I_i}|C'(x)|^2dx, \qquad u \in I_i,$$

and thus (for any choice of $t_i \in \bar I_i$),

$$\|\tilde C_n(U) - C(U)\|_2^2 = \sum_{i=1}^n\int_{I_i}|\tilde C_n(u) - C(u)|^2du \le \sum_{i=1}^n\frac1{n^2}\int_{I_i}|C'(x)|^2dx = \frac1{n^2}\int_0^1|C'(x)|^2dx. \tag{2.23}$$

We have

$$C'(x) = 2\ln x - 2\ln(1-x)$$

and find

$$\int_0^1[\ln(1-x)]^2dx = \int_0^1(\ln x)^2dx = \int_0^\infty y^2e^{-y}dy = 2$$

and

$$\int_0^1[\ln x][\ln(1-x)]\,dx = \sum_{k=1}^\infty\frac1k\int_0^1x^k|\ln x|\,dx = \sum_{k=1}^\infty\frac1{k(k+1)^2} = \sum_{k=1}^\infty\left(\frac1{k(k+1)} - \frac1{(k+1)^2}\right) = 2 - \frac{\pi^2}6;$$

consequently,

$$\int_0^1|C'(x)|^2dx = 8\int_0^1(\ln x)^2dx - 8\int_0^1[\ln x][\ln(1-x)]\,dx = \frac{4\pi^2}3.$$

Hence (2.23) yields

$$\|\tilde C_n(U) - C(U)\|_2 \le \frac1n\|C'(U)\|_2 = \frac{2\pi}{\sqrt3\,n}. \tag{2.24}$$
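
The value $4\pi^2/3$ can be confirmed numerically from the series for the cross term. The sketch below is only such a check; the number of series terms is an assumption made for the example (the neglected tail is of order $1/(2K^2)$ after $K$ terms).

```python
import math

def cross_term_series(num_terms=1_000_000):
    """Partial sum of sum_{k>=1} 1 / (k (k+1)^2)."""
    return sum(1.0 / (k * (k + 1) ** 2) for k in range(1, num_terms + 1))

cross = cross_term_series()                  # approximates 2 - pi^2/6
# 8 * int (ln x)^2 dx  -  8 * int ln x ln(1-x) dx, with int (ln x)^2 dx = 2
integral_of_C_prime_sq = 8 * 2 - 8 * cross
bound_constant = math.sqrt(integral_of_C_prime_sq)   # compare with 2*pi/sqrt(3)
```
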



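For the first term in (2.22), let us first assume that $n \ge 2$. For $u \in I_i$ we have

$$C_n(\lceil nu\rceil) - \tilde C_n(u) = C_n(i) - C(t_i) = \frac1n(-1 + \mu_{i-1} + \mu_{n-i} - \mu_n) - 2t_i\ln t_i - 2(1-t_i)\ln(1-t_i).$$

For $i \le \lceil n/2\rceil$ we choose $t_i = i/n$. This yields, using (1.11) and (1.10),

$$\begin{aligned} C_n(i) - C(t_i) \le \frac1n\Bigl[&-1 + 2i\ln i + (2\gamma-4)i + 3 \\ &+ 2(n-i+1)\ln(n-i) + (2\gamma-4)(n-i) + 2\gamma + 1 + \tfrac1{n-i} \\ &- 2(n+1)\ln n - (2\gamma-4)n - 2\gamma - 2i\ln\bigl(\tfrac in\bigr) - 2(n-i)\ln\bigl(\tfrac{n-i}n\bigr)\Bigr] \\ = \frac1n\Bigl[&2\ln\bigl(\tfrac{n-i}n\bigr) + 3 + \tfrac1{n-i}\Bigr] \le \frac1n\Bigl[3 - \tfrac{2i}n + \tfrac1{n-i}\Bigr] \le \frac3n. \end{aligned} \tag{2.25}$$

In the opposite direction, by (1.11) and (1.10), still for $i \le \lceil n/2\rceil$,

$$\begin{aligned} C_n(i) - C(t_i) \ge \frac1n\Bigl[&-1 + 2i\ln i + (2\gamma-4)i + 2 \\ &+ 2(n-i+1)\ln(n-i) + (2\gamma-4)(n-i) + 2\gamma \\ &- 2(n+1)\ln n - (2\gamma-4)n - 2\gamma - 1 - \tfrac1n - 2i\ln\bigl(\tfrac in\bigr) - 2(n-i)\ln\bigl(\tfrac{n-i}n\bigr)\Bigr] \\ = \frac1n\Bigl[&2\ln\bigl(\tfrac{n-i}n\bigr) - \tfrac1n\Bigr] \ge \frac1n\Bigl[2\ln\bigl(\tfrac13\bigr) - \tfrac12\Bigr] \ge -\frac3n. \end{aligned}$$

Consequently, for $i \le \lceil n/2\rceil$,

$$|C_n(i) - C(t_i)| \le 3/n. \tag{2.26}$$

For $i > \lceil n/2\rceil$ we choose instead $t_i = (i-1)/n = 1 - t_{n+1-i}$. The symmetries of $C_n$ and $C$ then yield $C_n(i) - C(t_i) = C_n(n+1-i) - C(t_{n+1-i})$, and since $n+1-i \le n/2$, (2.26) shows that $|C_n(i) - C(t_i)| \le 3/n$ for $i > \lceil n/2\rceil$, too, i.e., (2.26) holds for all $i \le n$. In other words,

$$|C_n(\lceil nu\rceil) - \tilde C_n(u)| = |C_n(\lceil nu\rceil) - C(t_{\lceil nu\rceil})| \le 3/n$$

for all $u \in (0,1]$; in particular, $\|C_n(\lceil nU\rceil) - \tilde C_n(U)\|_2 \le 3/n$ for all $n \ge 2$. This holds trivially for $n = 1$, too, for any choice of $t_1$, and together with (2.22) and (2.24) yields the result. □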
Remark 2.4. Define

$$c^* := \sup\{n^{1/2}d_2(Y_n, Y) : n \ge 1\},$$

so that, by Theorem 2.1, $c^* \le 2$. Conversely,

$$c^* \ge 2^{1/2}d_2(Y_2, Y) = 2^{1/2}\|Y\|_2 = \sigma\sqrt2 > 0.9168;$$

thus the constant 2 in Theorem 2.1 is no more than about twice the optimal value.

Although we do not know the exact value of $d_2(Y_n, Y)$ for any $n > 2$, one can in principle for any $n$ and $m$ compute the exact distributions of $Y_n$ and $Y_m$ and thus $d_2(Y_n, Y_m)$. We have done this for some $m, n \le 50$ using Mathematica and Maple. The results are consistent with a decay of the type $d_2(Y_n, Y) \sim cn^{-1/2}$ with $c \approx 1$, but our data are too few to be conclusive.



3 Bounding $d_p(Y_n, Y)$

In this section we extend Theorem 2.1 and show that $d_p(Y_n, Y) = O(n^{-1/2})$ for every $p$. In contrast to the style of Section 2, we will make no attempt to keep constants small, nor to keep track of them explicitly.

Theorem 3.1. For every $p \ge 1$, there exists a constant $c_p < \infty$ such that

$$d_p(Y_n, Y) \le c_p/\sqrt n, \qquad n \ge 1.$$

Proof. Since $d_p \le d_q$ when $p \le q$, it suffices to consider integer $p \ge 2$. The case $p = 2$ is Theorem 2.1 (with $c_2 = 2$), so we assume further that $p \ge 3$. We use induction on $p$ and assume that the result holds for smaller positive integer values of $p$.

Let $Y$, $Y_n$, $Y^*$, $Y^*_n$, $U$ be as in Section 2, and note that for every $p \ge 1$,

$$\|Y_n - Y\|_p = \|Y^*_n - Y^*\|_p = d_p(Y_n, Y), \qquad n \ge 0, \tag{3.1}$$

by the fact [?] that there is an optimal coupling for $d_2$ that is optimal for every $d_p$. Using the notation of Section 2, we have, for $n \ge 1$,

$$d_p(Y_n, Y) \le \|\tilde Y_n - \tilde Y\|_p = \|W_1 + W_2 + W_3\|_p. \tag{3.2}$$

We use a simple lemma to estimate this.

Lemma 3.2. Let $Z_1$, $Z_2$, and $Z_3$ be three independent random variables, and let $p \ge 2$ be an integer. Then

$$E|Z_1 + Z_2 + Z_3|^p \le E|Z_1|^p + E|Z_2|^p + \bigl(\|Z_1\|_{p-1} + \|Z_2\|_{p-1} + \|Z_3\|_p\bigr)^p.$$

Proof. By the binomial theorem and independence,

$$E|Z_1 + Z_2 + Z_3|^p \le E(|Z_1| + |Z_2| + |Z_3|)^p = \sum_{j+k+l=p}\binom p{j,k,l}\bigl(E|Z_1|^j\bigr)\bigl(E|Z_2|^k\bigr)\bigl(E|Z_3|^l\bigr).$$

If $j \le p-1$ and $k \le p-1$ we estimate $E|Z_1|^j = \|Z_1\|_j^j \le \|Z_1\|_{p-1}^j$ (which holds also for $j = 0$, disregarding the central expression) and similarly $E|Z_2|^k \le \|Z_2\|_{p-1}^k$ and $E|Z_3|^l \le \|Z_3\|_p^l$. Hence all terms in the sum, except $E|Z_1|^p$ and $E|Z_2|^p$, are bounded by the corresponding terms in the trinomial expansion of $(\|Z_1\|_{p-1} + \|Z_2\|_{p-1} + \|Z_3\|_p)^p$.

□
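
Lemma 3.2 can be illustrated on small discrete examples, where both sides are computable by exhaustive enumeration. The sketch below takes each $Z_i$ uniform on a short list of values; the lists, the seed, and the function names are assumptions made for the example.

```python
from itertools import product
import random

def moment(values, q):
    """E|Z|^q for Z uniform on the listed values."""
    return sum(abs(v) ** q for v in values) / len(values)

def lemma_sides(z1, z2, z3, p):
    """Both sides of Lemma 3.2 for independent Z_i uniform on finite lists."""
    triples = list(product(z1, z2, z3))        # independence: product measure
    lhs = sum(abs(a + b + c) ** p for a, b, c in triples) / len(triples)
    norm = lambda vals, q: moment(vals, q) ** (1.0 / q)
    rhs = (moment(z1, p) + moment(z2, p)
           + (norm(z1, p - 1) + norm(z2, p - 1) + norm(z3, p)) ** p)
    return lhs, rhs

rng = random.Random(2)
samples = [[rng.uniform(-2, 2) for _ in range(4)] for _ in range(3)]
sides = {p: lemma_sides(*samples, p) for p in range(2, 7)}
```
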

Conditional on $U = u$, the three variables $W_1$, $W_2$, and $W_3$ are independent, so the lemma is applicable. Fix $u \in (0,1)$ and let $i = \lceil nu\rceil$, so $1 \le i \le n$. Then, given $U = u$, $W_1 = \frac{i-1}nY_{i-1} - uY$ and thus, for any $q \ge 1$,

$$\begin{aligned} E(|W_1|^q \mid U = u)^{1/q} &= \left\|\tfrac{i-1}nY_{i-1} - uY\right\|_q \le \left\|\tfrac{i-1}n(Y_{i-1} - Y)\right\|_q + \left|\tfrac{i-1}n - u\right|\|Y\|_q \\ &\le \frac{i-1}nd_q(Y_{i-1}, Y) + \frac1n\|Y\|_q. \end{aligned} \tag{3.3}$$

Similarly,

$$E(|W_2|^q \mid U = u)^{1/q} \le \frac{n-i}nd_q(Y_{n-i}, Y) + \frac1n\|Y\|_q. \tag{3.4}$$

Further, given $U = u$, $W_3 = C_n(i) - C(u)$ is a constant, for which we use the simple estimate (from Proposition 3.2 in [16])



\W 3 \ = \C n {\nu\) - C{u)\ < I lnn + O (n" 1 ) = O (r^ 1 / 2 ) . (3.5) 

We first use (3.3) with $q = p-1$ together with the induction hypothesis $d_{p-1}(Y_{i-1}, Y) \le c_{p-1}(i-1)^{-1/2}$, $i \ge 2$, to obtain (also for $i = 1$)

$$E(|W_1|^{p-1} \mid U = u)^{1/(p-1)} \le c_{p-1}\frac{(i-1)^{1/2}}n + \frac1n\|Y\|_{p-1} \le b_1n^{-1/2},$$

where $b_1$, like $b_2$, $b_3$, $b_4$ below, denotes some constant depending on $p$ only. By similar argument using (3.4) and (3.5), we obtain

$$E(|W_1|^{p-1} \mid U = u)^{1/(p-1)} + E(|W_2|^{p-1} \mid U = u)^{1/(p-1)} + E(|W_3|^p \mid U = u)^{1/p} \le b_2n^{-1/2}.$$

Hence, using (3.3) and (3.4) for $q = p$, too, Lemma 3.2 yields

$$E(|W_1 + W_2 + W_3|^p \mid U = u) \le \left(\frac{i-1}nd_p(Y_{i-1}, Y) + b_3\frac1n\right)^p + \left(\frac{n-i}nd_p(Y_{n-i}, Y) + b_3\frac1n\right)^p + b_2^pn^{-p/2}.$$

Taking the average over all $u \in (0,1)$ we finally find the recursive estimate

$$d_p(Y_n, Y)^p \le E|W_1 + W_2 + W_3|^p = E\,E(|W_1 + W_2 + W_3|^p \mid U) \le \frac2n\sum_{j=0}^{n-1}\left(\frac jnd_p(Y_j, Y) + b_3\frac1n\right)^p + b_2^pn^{-p/2}. \tag{3.6}$$

The proof is now completed by another induction, this one on $n$. Suppose that $d_p(Y_j, Y) \le cj^{-1/2}$ for $1 \le j \le n-1$. Then (3.6) yields

$$\begin{aligned} d_p(Y_n, Y)^p &\le \frac2n\sum_{j=1}^{n-1}\bigl(cj^{1/2}n^{-1} + b_3n^{-1}\bigr)^p + \frac2nb_3^pn^{-p} + b_2^pn^{-p/2} \\ &\le \frac2n(c + b_3)^p\sum_{j=1}^{n-1}j^{p/2}n^{-p} + b_4n^{-p/2} \\ &\le 2(c + b_3)^p\int_0^1x^{p/2}n^{-p/2}dx + b_4n^{-p/2} \\ &= \left(\frac{2(c + b_3)^p}{(p/2) + 1} + b_4\right)n^{-p/2}. \end{aligned} \tag{3.7}$$

Since $p \ge 3$, we have $\frac2{(p/2)+1} = \frac4{p+2} < 1$, and thus, if $c$ is sufficiently large,

$$\frac4{p+2}(c + b_3)^p + b_4 \le c^p.$$

For such $c$, (3.7) yields $d_p(Y_n, Y)^p \le (cn^{-1/2})^p$, which completes both inductions and the proof. □

Note that the arguments used above for $p \ge 3$ do not work for $p = 2$, so we need both the proof here and the proof in Section 2.

4 Lower bounds for $d_p(Y_n, Y)$

We do not know whether the upper bounds $O(n^{-1/2})$ proved in the preceding two sections are sharp. We give in this section two simple lower bounds. First, $d_p(Y_n, Y) = \Omega(n^{-1})$ for every $p$ by the following general result.

Proposition 4.1. Let $W, W_1, W_2, \dots$ be random variables such that $W$ has an absolutely continuous distribution while, for each $n \ge 1$ and some constant $b_n$, $n(W_n - b_n)$ is integer-valued. Then, for each $1 \le p \le \infty$, $d_p(W_n, W) = \Omega(1/n)$. More precisely,

$$\liminf_{n\to\infty}\,n\,d_p(W_n, W) \ge \tfrac12(p+1)^{-1/p}. \tag{4.1}$$

Proof. Let $V_n := \{n(W - b_n)\}$, where $\{x\} := x - \lfloor x\rfloor$ denotes the fractional part of $x$. For any coupling of $W$ and $W_n$,

$$|W - W_n| = \tfrac1n|n(W - b_n) - n(W_n - b_n)| \ge \tfrac1nh(V_n),$$

where $h(x) := \min(x, 1-x)$, $0 \le x \le 1$, and thus

$$d_p(W, W_n) \ge \tfrac1n\|h(V_n)\|_p. \tag{4.2}$$
We regard $V_n$ as a random variable taking values in $R/Z = T$, and find that its distribution, $\nu_n$ say, has Fourier coefficients

$$\hat\nu_n(k) = E e^{2\pi ikV_n} = e^{-2\pi iknb_n}\phi(2\pi kn), \qquad k \in Z,$$

where $\phi$ is the characteristic function of $W$. In particular, $|\hat\nu_n(k)| = |\phi(2\pi kn)|$. By our hypothesis on $W$ and the Riemann-Lebesgue lemma, $\phi(x) \to 0$ as $x \to \pm\infty$. Thus, for each fixed $k \ne 0$, $\hat\nu_n(k) \to 0$ as $n \to \infty$. This implies that $\nu_n$ converges weakly (as measures on $T$) to the uniform distribution, i.e., $V_n \stackrel{\mathcal L}{\to} U$ where $U \sim \mathrm{unif}(0,1)$. Consequently, as $n \to \infty$,

$$\|h(V_n)\|_p^p = E\,h(V_n)^p \to E\,h(U)^p = 2\int_0^{1/2}x^p\,dx = 2^{-p}/(p+1),$$

which together with (4.2) leads to (4.1). The proof of the proposition is completed by observing $d_p(W_n, W) > 0$ for every $n$, because $W_n \stackrel{\mathcal L}{\ne} W$. □
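
The limiting constant in (4.1) comes from the single integral $E\,h(U)^p$. A short numerical sketch of that step follows; the midpoint rule and the function names are assumptions made for the example.

```python
def h(x):
    """Distance from x in [0, 1] to the nearest endpoint."""
    return min(x, 1 - x)

def mean_h_power(p, num_points=100_000):
    """Midpoint-rule approximation of E h(U)^p for U ~ unif(0, 1)."""
    step = 1.0 / num_points
    return sum(h((j + 0.5) * step) ** p for j in range(num_points)) * step

def limit_constant(p):
    """The constant (1/2)(p+1)^(-1/p) on the right-hand side of (4.1)."""
    return 0.5 * (p + 1) ** (-1.0 / p)

approximations = {p: mean_h_power(p) for p in (1, 2, 3, 5)}
```
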



Note that, in contrast to the asymptotic result (4.1), there is no positive lower bound to $d_p(W_n, W)$ for a fixed $n$ without further assumptions. Hence the implicit constant in $\Omega(1/n)$ in Proposition 4.1 depends on the variables $W, W_1, \dots$.

For $p \ge 2$ we can improve this lower bound by a logarithmic factor by using the known variance of $Y_n$.

Theorem 4.2. If $2 \le p \le \infty$, then

$$d_p(Y_n, Y) \ge d_2(Y_n, Y) = \Omega\!\left(\frac{\ln n}n\right).$$

Proof. Recall that $Y$ and $Y_n$ have mean 0 and that $\operatorname{Var} Y = \sigma^2$ while by (1.2)

$$\operatorname{Var} Y_n = \sigma^2 - 2\frac{\ln n}n + O(n^{-1})$$

and thus

$$\|Y_n\|_2 = (\operatorname{Var} Y_n)^{1/2} = \sigma - \sigma^{-1}\frac{\ln n}n + O(n^{-1}).$$

Consequently, for the $d_2$-optimal coupling of $Y$ and $Y_n$, by Minkowski's inequality,

$$d_2(Y_n, Y) = \|Y_n - Y\|_2 \ge \|Y\|_2 - \|Y_n\|_2 = \sigma^{-1}\frac{\ln n}n + O(n^{-1}).$$

□

We still have a gap between $(\ln n)/n$ and $n^{-1/2}$.

Remark 4.3. It can be shown that $E\,Y_n^m = E\,Y^m + O\bigl(\frac{\ln n}n\bigr)$, $n \ge 2$, holds also for $m = 3, 4, \dots$; cf. the formulas for moments and cumulants by Hennequin [?]. Hence we do not get better lower bounds for $d_p$ by considering higher moments.
5 The Kolmogorov-Smirnov distance

Recall that the Kolmogorov-Smirnov distance $d_{KS}(F,G)$ between two distributions is defined as $\sup_{x\in R}|P(X \le x) - P(Y \le x)|$, when $X \sim F$ and $Y \sim G$. We will in this case also write $d_{KS}(X,Y)$.

To obtain upper bounds for $d_{KS}(Y_n, Y)$, we combine the bounds above for $d_p(Y_n, Y)$ with the following simple general result and the fact [?] that $Y$ has a bounded density function.

Lemma 5.1. Suppose that $X$ and $Y$ are two random variables such that $Y$ is absolutely continuous with a bounded density function $f$. If $M := \sup_{y\in R}|f(y)|$ and $1 \le p < \infty$, then

$$d_{KS}(X,Y) \le (p+1)^{1/(p+1)}\bigl(M\,d_p(X,Y)\bigr)^{p/(p+1)}.$$

Proof. Consider an optimal $d_p$-coupling of $X$ and $Y$. Then, for $x \in R$ and $\varepsilon > 0$, denoting the distribution functions of $X$ and $Y$ by $F_X$ and $F_Y$,

$$F_X(x) = P(X \le x) \le P(Y \le x) + P(x < Y \le x+\varepsilon) + P(Y - X > \varepsilon) \le F_Y(x) + M\varepsilon + P(Y - X > \varepsilon).$$

Similarly,

$$F_Y(x) \le P(X \le x) + P(x-\varepsilon < Y \le x) + P(X - Y > \varepsilon) \le F_X(x) + M\varepsilon + P(X - Y > \varepsilon).$$

Consequently,

$$\Delta(x) := |F_X(x) - F_Y(x)| \le M\varepsilon + P(|X - Y| > \varepsilon)$$

and thus

$$d_p(X,Y)^p = E|X - Y|^p = \int_0^\infty p\varepsilon^{p-1}P(|X - Y| > \varepsilon)\,d\varepsilon \ge \int_0^{\Delta(x)/M}p\varepsilon^{p-1}(\Delta(x) - M\varepsilon)\,d\varepsilon = \frac1{p+1}\Delta(x)^{p+1}M^{-p}.$$

□
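
For completeness, the last integral in the proof of Lemma 5.1 and the rearrangement that gives the stated bound can be written out as follows (a routine computation, with $\Delta = \Delta(x)$):

```latex
\begin{align*}
\int_0^{\Delta/M} p\varepsilon^{p-1}(\Delta - M\varepsilon)\,d\varepsilon
  &= \Delta\left(\frac{\Delta}{M}\right)^{p}
     - \frac{pM}{p+1}\left(\frac{\Delta}{M}\right)^{p+1}
   = \frac{\Delta^{p+1}}{M^{p}}\left(1 - \frac{p}{p+1}\right)
   = \frac{1}{p+1}\,\Delta^{p+1} M^{-p}, \\
\Delta^{p+1} &\le (p+1)\,M^{p}\, d_p(X,Y)^{p}
  \quad\Longrightarrow\quad
  \Delta \le (p+1)^{1/(p+1)} \bigl(M\, d_p(X,Y)\bigr)^{p/(p+1)} .
\end{align*}
```

Taking the supremum over $x$ then gives the bound on $d_{KS}(X,Y)$.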



Theorem 5.2. For every $\varepsilon > 0$,

$$d_{KS}(Y_n, Y) = O(n^{\varepsilon-(1/2)}).$$

Proof. By [?], $Y$ has a bounded density function, so Lemma 5.1 and Theorem 3.1 yield, for every fixed $1 \le p < \infty$,

$$d_{KS}(Y_n, Y) = O\bigl(d_p(Y_n, Y)^{p/(p+1)}\bigr) = O\bigl(n^{-p/(2(p+1))}\bigr).$$

The result follows by choosing $p$ so large that $\frac p{2(p+1)} \ge \frac12 - \varepsilon$. □
To get an explicit bound we take $p = 2$ in Lemma 5.1 and use Theorem 2.1. This yields the bound $3^{1/3}(2Mn^{-1/2})^{2/3}$, and we know $M < 16$ from Theorem 3.3 of [?]. Hence,

Theorem 5.3. For $n \ge 1$,

$$d_{KS}(Y_n, Y) \le (12M^2)^{1/3}n^{-1/3} \le (3072/n)^{1/3} < 15\,n^{-1/3}.$$

□

Numerical evidence [17] suggests that $M < 1$, which would give a bound $2.3\,n^{-1/3}$.

We conjecture that Theorem 5.2 holds with $\varepsilon = 0$, too, i.e., that $d_{KS}(Y_n, Y) = O(n^{-1/2})$. Even if this were proved, it is not clear what the right order of decay is; the best lower bound we can prove is $\Omega(n^{-1})$.
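
The numerical constants in Theorem 5.3 follow from simple arithmetic, which the sketch below spells out; the function name is an assumption made for the example, and the case $M = 1$ is conjectural, as stated above.

```python
def ks_constant(M):
    """Constant (12 M^2)^(1/3) multiplying n^(-1/3) in Theorem 5.3."""
    return (12 * M * M) ** (1.0 / 3.0)

proved = ks_constant(16)       # uses the proved density bound M < 16
conjectural = ks_constant(1)   # would apply only if M < 1 were established
```
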

Theorem 5.4. For $n \ge 1$,

$$d_{KS}(Y_n, Y) \ge \frac1{8(n+1)}.$$

Again, the lower bound follows from quite general considerations. In this case we use the following lemma.

Lemma 5.5. Suppose that $Y$ and $Z$ are two random variables such that $Y$ has a continuous distribution while $a(Z - b)$ is integer-valued for some real numbers $a > 0$ and $b$. If $\sigma_Z^2 := \operatorname{Var} Z < \infty$, then

$$d_{KS}(Y, Z) \ge 1/(12a\sigma_Z + 8).$$

Proof. For any $x \in R$ and $\delta > 0$,

$$F_Z(x+\delta) - F_Y(x+\delta) + F_Y(x-\delta) - F_Z(x-\delta) \le 2d_{KS}(Y, Z).$$

Letting $\delta \to 0$ we find, since $Y$ is continuous,

$$P(Z = x) \le 2d_{KS}(Y, Z).$$

The result now follows from the following lemma applied to $a(Z - b)$. □

Lemma 5.6. If $Z$ is an integer-valued random variable with finite variance $\sigma_Z^2$, then

$$\sup_nP(Z = n) \ge 1/(6\sigma_Z + 4).$$

Proof. Let $\mu := E Z$ and $m := \lceil\tfrac32\sigma_Z\rceil$. By Chebyshev's inequality,

$$P(|Z - \mu| \ge m) \le \frac{\sigma_Z^2}{m^2} \le \frac49 < \frac12$$

and thus

$$P(\mu - m < Z < \mu + m) > 1/2.$$

The interval $(\mu - m, \mu + m)$ contains at most $2m$ integers, and thus it must contain an integer $n$ such that

$$P(Z = n) \ge \frac1{2m}P(\mu - m < Z < \mu + m) > \frac1{4m} \ge \frac1{6\sigma_Z + 4}.$$

□
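
As an illustration of Lemma 5.6, the sketch below compares the largest point mass of a binomial distribution with the lower bound $1/(6\sigma_Z + 4)$. The choice of the binomial family and the function names are assumptions made for the example; the lemma itself applies to any integer-valued variable with finite variance.

```python
import math

def binomial_pmf(n, q):
    """Probability mass function of Binomial(n, q) as a list over 0..n."""
    return [math.comb(n, k) * q ** k * (1 - q) ** (n - k) for k in range(n + 1)]

def lemma_5_6_sides(pmf):
    """Return (largest point mass, 1 / (6 sigma_Z + 4)) for a pmf on 0..len-1."""
    mean = sum(k * w for k, w in enumerate(pmf))
    var = sum((k - mean) ** 2 * w for k, w in enumerate(pmf))
    return max(pmf), 1.0 / (6 * math.sqrt(var) + 4)

examples = {(n, q): lemma_5_6_sides(binomial_pmf(n, q))
            for n in (1, 10, 100, 400) for q in (0.1, 0.5)}
```

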
Proof of Theorem 5.4. We apply Lemma 5.5 with $a = n$ and observe that

$$\sigma_{Y_n} := (\operatorname{Var} Y_n)^{1/2} < \sigma = (\operatorname{Var} Y)^{1/2} \tag{5.1}$$

and that $12\sigma \approx 7.8 < 8$. Indeed, (5.1) is trivial for $n = 1$ or 2 and easily verified for $3 \le n \le 6$, while for $n \ge 7$ it holds because then, by (1.2) and (1.3),

$$\begin{aligned} \sigma^2 - \operatorname{Var} Y_n &= -4\frac{\pi^2}6 + 4\left(1 + \frac1n\right)^2H_n^{(2)} + 2\frac{n+1}{n^2}H_n - \frac{13}n \\ &> -4\sum_{k=n+1}^\infty k^{-2} + \frac8nH_n^{(2)} + \frac2nH_n - \frac{13}n \\ &> \frac1n\bigl(-4 + 8H_n^{(2)} + 2H_n - 13\bigr) > 0. \end{aligned}$$

□
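
Inequality (5.1) can be verified directly for a range of $n$ from the exact variance formula (1.2). The sketch below does this for $n \le 200$; the range and the function name are assumptions made for the example, and larger $n$ are covered by the analytic argument above.

```python
from fractions import Fraction
import math

def variance_of_Y_n(n):
    """Var Y_n = Var X_n / n^2, using the exact formula (1.2)."""
    h1 = sum(Fraction(1, k) for k in range(1, n + 1))
    h2 = sum(Fraction(1, k * k) for k in range(1, n + 1))
    var_x = 7 * n ** 2 - 4 * (n + 1) ** 2 * h2 - 2 * (n + 1) * h1 + 13 * n
    return var_x / n ** 2

SIGMA_SQ = 7 - 2 * math.pi ** 2 / 3   # Var Y, from (1.3)
holds = all(float(variance_of_Y_n(n)) < SIGMA_SQ for n in range(1, 201))
```
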

6 Approximating the density of $Y$

It was shown in [?] that the density $f$ of $Y$ is infinitely differentiable, with all derivatives rapidly decaying. In particular, the derivative $f'$ is bounded; Theorem 3.3 of [?] gives the explicit bound

$$M' := \sup_x|f'(x)| < 2466.$$

(This is not very sharp; the true value seems to be less than 2.) The bounds above on the Kolmogorov-Smirnov distance then imply the following local result.

Theorem 6.1. For any $x \in R$ and $\delta > 0$,

$$\left|\frac{F_n(x + \frac\delta2) - F_n(x - \frac\delta2)}\delta - f(x)\right| \le \frac{(96M^2)^{1/3}}{\delta n^{1/3}} + \frac{M'}4\delta.$$

In particular, for any $\bar M \ge M$ and $\bar M' \ge M'$, choosing $\delta = \delta_n := 2(96\bar M^2(\bar M')^{-3})^{1/6}n^{-1/6}$ yields

$$\left|\frac{F_n(x + \frac{\delta_n}2) - F_n(x - \frac{\delta_n}2)}{\delta_n} - f(x)\right| \le (96\bar M^2(\bar M')^3)^{1/6}n^{-1/6}. \tag{6.1}$$

The choices $\bar M = 16$ and $\bar M' = 2466$ provided by [?] yield the bound $268\,n^{-1/6}$ in (6.1). If $\bar M = 1$ and $\bar M' = 2$ could be proven to be legitimate, we could reduce the bound to $3.03\,n^{-1/6}$.
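
The constants quoted after (6.1), and the fact that $\delta_n$ balances the two terms of the first estimate, can be reproduced with a few lines of arithmetic. The function names below are assumptions made for the example.

```python
def local_bound_constant(M_bar, M_prime_bar):
    """Constant (96 M^2 M'^3)^(1/6) multiplying n^(-1/6) in (6.1)."""
    return (96 * M_bar ** 2 * M_prime_bar ** 3) ** (1.0 / 6.0)

def optimal_delta(M_bar, M_prime_bar, n):
    """Window width delta_n = 2 (96 M^2 M'^-3)^(1/6) n^(-1/6)."""
    return 2 * (96 * M_bar ** 2 / M_prime_bar ** 3) ** (1.0 / 6.0) * n ** (-1.0 / 6.0)

def two_term_bound(M_bar, M_prime_bar, n, delta):
    """First estimate of Theorem 6.1 evaluated at a given delta."""
    return ((96 * M_bar ** 2) ** (1.0 / 3.0) / (delta * n ** (1.0 / 3.0))
            + M_prime_bar * delta / 4)

proved_constant = local_bound_constant(16, 2466)   # with the proved bounds
hoped_constant = local_bound_constant(1, 2)        # only if M = 1, M' = 2 were legitimate
```

Evaluating `two_term_bound` at `optimal_delta(...)` returns `local_bound_constant(...) * n ** (-1/6)`, which is how (6.1) follows from the first estimate.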



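Proof. By Theorem 5.3,

$$\bigl|F_n(x + \tfrac\delta2) - F_n(x - \tfrac\delta2) - \bigl(F(x + \tfrac\delta2) - F(x - \tfrac\delta2)\bigr)\bigr| \le 2d_{KS}(Y_n, Y) \le 2(12M^2)^{1/3}n^{-1/3}$$

while

$$\bigl|F(x + \tfrac\delta2) - F(x - \tfrac\delta2) - \delta f(x)\bigr| = \left|\int_{-\delta/2}^{\delta/2}\bigl(f(x+y) - f(x)\bigr)\,dy\right| \le \int_{-\delta/2}^{\delta/2}M'|y|\,dy = \frac{M'}4\delta^2.$$

The first estimate follows, and the second is an immediate consequence. □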



Theorem 6.1 yields a simple method to numerically calculate the unknown density $f$ up to any given accuracy. For an application, see [?]. (In [?], a preliminary version of Theorem 6.1 with larger constants is used.) Note, however, that the convergence is slow and that it seems impractical to obtain high precision by this method. Other, potentially more powerful, methods to calculate $f$ numerically are discussed in [?].

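Open Problem 6.2. Does a local limit theorem hold in the form that

$$nP(X_n = k) - f\!\left(\frac{k - \mu_n}n\right) = nP\!\left(Y_n = \frac{k - \mu_n}n\right) - f\!\left(\frac{k - \mu_n}n\right) \to 0,$$

perhaps uniformly in $k \in Z$, as $n \to \infty$?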



7 Bounds on moment generating functions

Rösler [16] proved that the moment generating functions $E e^{\lambda Y_n}$ are bounded for fixed $\lambda$, and thus $E e^{\lambda Y_n} \to E e^{\lambda Y}$ as $n \to \infty$. Rösler did not make his estimates explicit, but his method can be used to obtain explicit bounds. For the limit variable $Y$, this was done in [?], where we obtained by Rösler's method (with some refinements) the following explicit estimates for the moment generating function of $Y$: Let $L_0 \approx 5.018$ be the largest root of $e^L = 6L^2$; then

$$\psi_Y(\lambda) := E e^{\lambda Y} \le \begin{cases} e^{1.25\lambda^2}, & \lambda \le -0.62, \\ e^{0.5\lambda^2}, & -0.62 \le \lambda \le 0, \\ e^{\lambda^2}, & 0 \le \lambda \le 0.42, \\ e^{12\lambda^2}, & 0.42 \le \lambda \le L_0, \\ e^{2e^\lambda}, & L_0 \le \lambda. \end{cases} \tag{7.1}$$

In particular, $E e^{\lambda Y} \le \exp(\max(12\lambda^2, 2e^\lambda))$ for all $\lambda \in R$.

The constants in (7.1) are not sharp, but the doubly exponential growth as $\lambda \to +\infty$ is correct: it was also shown in [?] that $\psi_Y(\lambda) \ge \exp(\gamma\lambda^{-1}e^\lambda)$ for all large $\lambda$ whenever $\gamma < 2/e$.
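
The threshold $L_0$ is the point where the two expressions $12\lambda^2$ and $2e^\lambda$ in the combined bound cross. A bisection sketch for locating it follows; the bracketing interval $[4, 6]$ and the function names are assumptions made for the example (for $L \ge 6$ the function $e^L - 6L^2$ is positive and increasing, so the root in this bracket is the largest one).

```python
import math

def largest_root(lo=4.0, hi=6.0, tol=1e-12):
    """Bisection for the root of e^L = 6 L^2 in [lo, hi]."""
    f = lambda L: math.exp(L) - 6 * L * L
    assert f(lo) < 0 < f(hi)          # sign change brackets the root
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

L0 = largest_root()

def mgf_bound(lam):
    """Upper bound exp(max(12 lam^2, 2 e^lam)) on E exp(lam * Y)."""
    return math.exp(max(12 * lam * lam, 2 * math.exp(lam)))
```
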

In this section we will establish similar bounds for $E e^{\lambda Y_n}$. For simplicity we first consider the slight shrinkage

$$\hat Y_n := \frac n{n+1}Y_n = \frac{X_n - \mu_n}{n+1}$$

of $Y_n$; in particular, $\hat Y_0 := X_0 - \mu_0 = 0$. We then have the following simple result.

Theorem 7.1. $E e^{\lambda\hat Y_n} \uparrow E e^{\lambda Y}$ as $n \uparrow \infty$. Hence, for any $n \ge 0$, $E e^{\lambda\hat Y_n} \le E e^{\lambda Y}$, and in particular the upper bounds on $E e^{\lambda Y}$ in (7.1) above apply also to $E e^{\lambda\hat Y_n}$.
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Proof. It is well known that the number X n of Quicksort comparisons has the same 
distribution as the internal path length of a random binary search tree (under the ran- 
dom permutation model) with n internal nodes — see, e.g., [fl4| , Section 6.2.2]. Moreover, 
it was shown by Regnier ]l5| that when X n is reinterpreted as the internal path length 
of an evolving random binary search tree after n keys have been inserted, the process 
(Y n )n>o is a martingale, which is L 2 -bounded and thus converges a.s. and in 1? to some 
limit Y . It follows that also Y n —* Y a.s., and thus in distribution; hence this random 
variable Y is (a realization of) the same Y as above. 

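The martingale property can be written $\widehat{Y}_n = \mathbf{E}(\widehat{Y}_{n+1} \mid \mathcal{F}_n)$, for the appropriate $\sigma$-field
$\mathcal{F}_n$. Since $x \mapsto e^{\lambda x}$ is convex, it now follows by Jensen's inequality for conditional
expectations that $e^{\lambda \widehat{Y}_n} \le \mathbf{E}(e^{\lambda \widehat{Y}_{n+1}} \mid \mathcal{F}_n)$; and thus, taking expectations, $\mathbf{E}\,e^{\lambda \widehat{Y}_n} \le \mathbf{E}\,e^{\lambda \widehat{Y}_{n+1}}$.

By the same argument, $\mathbf{E}\,e^{\lambda \widehat{Y}_n} \le \mathbf{E}\,e^{\lambda Y}$ for each $n \ge 0$, which together with Fatou's
lemma yields $\mathbf{E}\,e^{\lambda \widehat{Y}_n} \uparrow \mathbf{E}\,e^{\lambda Y}$ as $n \uparrow \infty$. □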
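The monotonicity in Theorem 7.1 can be checked numerically for small $n$. The following sketch is an illustration and not part of the proof: it computes the exact law of $X_n$ from the distributional recurrence (1.1) and evaluates the moment generating function of the shrunken variable $(X_n - \mu_n)/(n+1)$. The function names are hypothetical, and floating-point arithmetic is assumed to be accurate enough for the small values of $n$ used.

```python
import math

def comparison_distributions(n_max):
    """dist[n][k] = P(X_n = k), built from X_n = X_{U-1} + X*_{n-U} + n - 1."""
    dist = [{0: 1.0}]  # X_0 = 0
    for n in range(1, n_max + 1):
        pmf = {}
        for u in range(1, n + 1):  # pivot rank U_n = u, each with probability 1/n
            for a, pa in dist[u - 1].items():
                for b, pb in dist[n - u].items():
                    k = a + b + n - 1
                    pmf[k] = pmf.get(k, 0.0) + pa * pb / n
        dist.append(pmf)
    return dist

def mgf_shrunk(dist, n, lam):
    """E exp(lam * (X_n - mu_n) / (n + 1)), with mu_n the exact mean of X_n."""
    mu = sum(k * p for k, p in dist[n].items())
    return sum(p * math.exp(lam * (k - mu) / (n + 1)) for k, p in dist[n].items())

dist = comparison_distributions(12)
for lam in (-2.0, 0.5, 1.0, 3.0):
    values = [mgf_shrunk(dist, n, lam) for n in range(13)]
    # By Theorem 7.1 the sequence should be nondecreasing in n.
    print(lam, all(values[i] <= values[i + 1] + 1e-12 for i in range(12)))
```

For $n \le 2$ the variable $X_n$ is deterministic, so the first three values equal 1; the increase begins at $n = 3$.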



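Corollary 7.2. For every $n \ge 1$, we have

$$
\mathbf{E}\,e^{\lambda Y_n} \le
\begin{cases}
e^{1.25[1+(1/n)]^2\lambda^2}, & \lambda \le -0.62\,n/(n+1), \\
e^{0.5[1+(1/n)]^2\lambda^2}, & -0.62\,n/(n+1) \le \lambda \le 0, \\
e^{[1+(1/n)]^2\lambda^2}, & 0 \le \lambda \le 0.42\,n/(n+1), \\
e^{12[1+(1/n)]^2\lambda^2}, & 0.42\,n/(n+1) \le \lambda \le L_0\,n/(n+1), \\
e^{2e^{[1+(1/n)]\lambda}}, & L_0\,n/(n+1) \le \lambda;
\end{cases}
$$

in particular, $\mathbf{E}\,e^{\lambda Y_n} \le \exp\bigl(\max\bigl(12[1+(1/n)]^2\lambda^2,\ 2e^{[1+(1/n)]\lambda}\bigr)\bigr)$ for all $\lambda \in \mathbb{R}$.

Proof. $\lambda Y_n = \lambda_n \widehat{Y}_n$ with $\lambda_n := [1 + (1/n)]\lambda$. □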
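The one-line proof can be unpacked as follows. This is only a worked restatement of the substitution; the symbol $B(\cdot)$ is a shorthand introduced here (it does not appear in the original) for the piecewise right-hand side of (7.1).

```latex
% B(\lambda): shorthand (not in the original) for the right-hand side of (7.1).
\[
  Y_n = \frac{n+1}{n}\,\widehat{Y}_n
  \quad\Longrightarrow\quad
  \lambda Y_n = \lambda_n \widehat{Y}_n ,
  \qquad \lambda_n := \Bigl[1 + \frac{1}{n}\Bigr]\lambda ,
\]
\[
  \mathbf{E}\,e^{\lambda Y_n}
  = \mathbf{E}\,e^{\lambda_n \widehat{Y}_n}
  \le \mathbf{E}\,e^{\lambda_n Y}
  \le B(\lambda_n)
  \qquad \text{(Theorem 7.1, then (7.1))}.
\]
% Each threshold c in (7.1) is rescaled, because
\[
  \lambda_n \le c \iff \lambda \le \frac{c\,n}{n+1} ,
  \qquad c \in \{-0.62,\ 0,\ 0.42,\ L_0\}.
\]
```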



Remark 7.3. The factors $[1 + (1/n)]$ in Corollary 7.2 are annoying but hardly important
in applications. With some effort, we have been able to modify the proof in || and obtain
for $\lambda \ge -0.58$ the same estimates for $\mathbf{E}\,e^{\lambda Y_n}$ as obtained there for $\mathbf{E}\,e^{\lambda Y}$; for $\lambda < -0.58$
we only obtain a slightly weaker bound, which for large $n$ is inferior to the bound in
Corollary 7.2. More precisely, we have shown

$$
\mathbf{E}\,e^{\lambda Y_n} \le
\begin{cases}
e^{1.34\lambda^2}, & \lambda \le -0.58, \\
e^{0.5\lambda^2}, & -0.58 \le \lambda \le 0, \\
e^{\lambda^2}, & 0 \le \lambda \le 0.42, \\
e^{12\lambda^2}, & 0.42 \le \lambda \le L_0, \\
e^{2e^{\lambda}}, & L_0 \le \lambda.
\end{cases}
\tag{7.2}
$$

In particular, $\mathbf{E}\,e^{\lambda Y_n} \le \exp\bigl(\max(12\lambda^2, 2e^{\lambda})\bigr)$ for all $\lambda \in \mathbb{R}$. In other words, we can
eliminate the factors $[1 + (1/n)]$ in Corollary 7.2 for $\lambda \ge -0.58$ (and in particular for
all positive $\lambda$). Since the proof is quite long and the result only marginally improves
Corollary 7.2, we give the proof not here but rather in a separate appendix ||.

It seems likely that with further effort one could remove the factor $[1 + (1/n)]$ for
$\lambda < -0.58$ too, so that all the bounds in (7.1) also would bound $\mathbf{E}\,e^{\lambda Y_n}$. Moreover, it
seems quite likely that $\mathbf{E}\,e^{\lambda Y_n} \le \mathbf{E}\,e^{\lambda Y}$ holds for all $\lambda$ and $n$, and perhaps even that
$\mathbf{E}\,e^{\lambda Y_n} \uparrow \mathbf{E}\,e^{\lambda Y}$, as was proved for $\widehat{Y}_n$ in Theorem 7.1.



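Theorem 7.1 enables us to get an explicit constant in Rösler's [16] large deviation
bound.

Corollary 7.4. For any $\varepsilon > 0$ and $\lambda > 0$,

$$
\mathbf{P}(|X_n - \mu_n| \ge \varepsilon\mu_n) \le 2\exp\bigl(3\varepsilon\lambda + \max(12\lambda^2, 2e^{\lambda})\bigr)\, n^{-2\varepsilon\lambda}.
$$

Proof. By Markov's inequality,

$$
\begin{aligned}
\mathbf{P}(|X_n - \mu_n| \ge \varepsilon\mu_n)
&= \mathbf{P}\bigl(|\widehat{Y}_n| \ge \varepsilon\mu_n/(n+1)\bigr) \\
&\le \exp\bigl(-\varepsilon\lambda\mu_n/(n+1)\bigr)\, \mathbf{E}\,e^{\lambda|\widehat{Y}_n|} \\
&\le \exp\bigl(-\varepsilon\lambda\mu_n/(n+1)\bigr) \bigl(\mathbf{E}\,e^{\lambda \widehat{Y}_n} + \mathbf{E}\,e^{-\lambda \widehat{Y}_n}\bigr).
\end{aligned}
$$

The result follows from Theorem 7.1, since $\mu_n/(n+1) \ge 2H_n - 4 \ge 2\ln n - 3$ by (||)
and (||). □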
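For the reader's convenience, the last step of the proof can be written out as follows. This is a worked restatement under the bounds already quoted (Theorem 7.1, the uniform bound following (7.1), and $\mu_n/(n+1) \ge 2\ln n - 3$); nothing new is claimed.

```latex
% Both signs are covered: for \lambda > 0, Theorem 7.1 and (7.1) give
%   \mathbf{E}\,e^{\pm\lambda\widehat{Y}_n} \le \mathbf{E}\,e^{\pm\lambda Y}
%   \le \exp(\max(12\lambda^2, 2e^{\pm\lambda})) \le \exp(\max(12\lambda^2, 2e^{\lambda})).
\[
  \mathbf{E}\,e^{\lambda \widehat{Y}_n} + \mathbf{E}\,e^{-\lambda \widehat{Y}_n}
  \le 2\exp\bigl(\max(12\lambda^2, 2e^{\lambda})\bigr),
\]
% and the prefactor is controlled by the lower bound on \mu_n/(n+1):
\[
  \exp\Bigl(-\varepsilon\lambda\,\frac{\mu_n}{n+1}\Bigr)
  \le \exp\bigl(-\varepsilon\lambda\,(2\ln n - 3)\bigr)
  = e^{3\varepsilon\lambda}\, n^{-2\varepsilon\lambda}.
\]
% Multiplying the two displays yields the bound of Corollary 7.4.
```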



Corollary 7.5. For any fixed $\varepsilon > 0$,

$$
\mathbf{P}(|X_n - \mu_n| \ge \varepsilon\mu_n) \le n^{-2\varepsilon \ln\ln n + O(1)}, \qquad n \ge 2.
$$

Proof. Take (for $n \ge 3$) $\lambda = \ln\ln n$ in Corollary 7.4. □
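To see how the choice $\lambda = \ln\ln n$ plays out numerically, one can evaluate the logarithm of the bound in Corollary 7.4. The sketch below is illustrative only; the function names `log_deviation_bound` and `exponent_of_n` are hypothetical, and working on the log scale is a choice made here to avoid overflow.

```python
import math

def log_deviation_bound(n, eps, lam):
    """log of 2 * exp(3*eps*lam + max(12*lam^2, 2*e^lam)) * n**(-2*eps*lam)."""
    return (math.log(2.0) + 3.0 * eps * lam
            + max(12.0 * lam * lam, 2.0 * math.exp(lam))
            - 2.0 * eps * lam * math.log(n))

def exponent_of_n(n, eps):
    """Write the bound at lam = ln ln n as n**c and return c (requires n >= 3)."""
    lam = math.log(math.log(n))
    return log_deviation_bound(n, eps, lam) / math.log(n)

for n in (10**3, 10**6, 10**12, 10**100):
    c = exponent_of_n(n, eps=1.0)
    # c + 2*eps*ln ln n is the term absorbed into O(1) in Corollary 7.5.
    print(n, c, c + 2.0 * math.log(math.log(n)))
```

With $\lambda = \ln\ln n$ one has $2e^{\lambda} = 2\ln n$, which contributes a factor $n^{2}$; the remaining terms are of smaller order in the exponent, which is why the correction is $O(1)$.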



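Finally we show that the rate of convergence of the moment generating functions
$\mathbf{E}\,e^{\lambda Y_n}$ to $\mathbf{E}\,e^{\lambda Y}$ also is $O(n^{-1/2})$. (The same holds for $\mathbf{E}\,e^{\lambda \widehat{Y}_n}$.)

Theorem 7.6. For any fixed complex $\lambda$,

$$
\mathbf{E}\,e^{\lambda Y_n} = \mathbf{E}\,e^{\lambda Y} + O(n^{-1/2}).
$$

Explicitly, with $\lambda_1 := \operatorname{Re}(\lambda)$,

$$
\bigl|\mathbf{E}\,e^{\lambda Y_n} - \mathbf{E}\,e^{\lambda Y}\bigr|
\le 3|\lambda| \exp\Bigl(\max\bigl(24[1+(1/n)]^2\lambda_1^2,\ e^{2[1+(1/n)]|\lambda_1|}\bigr)\Bigr)\, n^{-1/2}.
$$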



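Proof. Consider a $d_2$-optimal coupling of $Y_n$ and $Y$. Then, using the mean value theorem,
the Cauchy-Schwarz inequality, Corollary 7.2 and (7.1),

$$
\begin{aligned}
\bigl|\mathbf{E}\,e^{\lambda Y_n} - \mathbf{E}\,e^{\lambda Y}\bigr|
&\le \mathbf{E}\bigl|e^{\lambda Y_n} - e^{\lambda Y}\bigr| \\
&\le \mathbf{E}\Bigl(|\lambda Y_n - \lambda Y| \max\bigl(e^{\lambda_1 Y_n}, e^{\lambda_1 Y}\bigr)\Bigr) \\
&\le |\lambda| \bigl(\mathbf{E}|Y_n - Y|^2\bigr)^{1/2} \Bigl(\mathbf{E}\max\bigl(e^{2\lambda_1 Y_n}, e^{2\lambda_1 Y}\bigr)\Bigr)^{1/2} \\
&\le |\lambda|\, d_2(Y_n, Y) \bigl(\mathbf{E}\,e^{2\lambda_1 Y_n} + \mathbf{E}\,e^{2\lambda_1 Y}\bigr)^{1/2} \\
&\le \sqrt{2}\,|\lambda| \exp\Bigl(\max\bigl(24[1+(1/n)]^2\lambda_1^2,\ e^{2[1+(1/n)]|\lambda_1|}\bigr)\Bigr)\, d_2(Y_n, Y).
\end{aligned}
$$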
The result follows by Theorem 2.1. □

Remark 7.7. By Remark 7.3, the factors $[1 + (1/n)]$ can be eliminated in the statement
of Theorem 7.6.
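The numerical constants in Theorem 7.6 can be traced as follows. This is a worked check, not part of the original text; it assumes that Theorem 2.1 is the bound $d_2(Y_n, Y) < 2n^{-1/2}$ quoted in the abstract, and $M$ is a shorthand introduced here for the maximum appearing in the theorem.

```latex
% M: shorthand (not in the original) for
%   \max\bigl(24[1+(1/n)]^2\lambda_1^2,\ e^{2[1+(1/n)]|\lambda_1|}\bigr).
% Corollary 7.2 and (7.1), applied at the argument 2\lambda_1, give
\[
  \mathbf{E}\,e^{2\lambda_1 Y_n} + \mathbf{E}\,e^{2\lambda_1 Y}
  \le 2\exp\Bigl(\max\bigl(48[1+(1/n)]^2\lambda_1^2,\ 2e^{2[1+(1/n)]|\lambda_1|}\bigr)\Bigr)
  = 2e^{2M},
\]
% so the square root contributes \sqrt{2}\,e^{M}. Assuming d_2(Y_n, Y) < 2n^{-1/2},
\[
  \sqrt{2}\,|\lambda|\,e^{M}\, d_2(Y_n, Y)
  < 2\sqrt{2}\,|\lambda|\,e^{M}\, n^{-1/2}
  \le 3\,|\lambda|\,e^{M}\, n^{-1/2},
  \qquad 2\sqrt{2} \approx 2.83 .
\]
```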

Acknowledgment. We thank Anhua Lin and Ludger Rüschendorf for helpful discussions.